Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network

Abstract

In this paper, we propose a quasi-periodic parallel WaveGAN (QPPWG) waveform generative model, which applies a quasi-periodic (QP) structure to a parallel WaveGAN (PWG) model using pitch-dependent dilated convolution networks (PDCNNs). PWG is a small-footprint GAN-based raw waveform generative model whose generation time is much faster than real time because of its compact model and its non-autoregressive (non-AR) and non-causal mechanisms. Although PWG achieves high-fidelity speech generation, the generic and simple network architecture lacks pitch controllability for an unseen auxiliary fundamental frequency ($F_{0}$) feature such as a scaled $F_{0}$. To improve the pitch controllability and speech modeling capability, we apply a QP structure with PDCNNs to PWG, which introduces pitch information to the network by dynamically changing the network architecture corresponding to the auxiliary $F_{0}$ feature. Both objective and subjective experimental results show that QPPWG outperforms PWG when the auxiliary $F_{0}$ feature is scaled. Moreover, analyses of the intermediate outputs of QPPWG also show better tractability and interpretability of QPPWG, which respectively models spectral and excitation-like signals using the cascaded fixed and adaptive blocks of the QP structure.
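The idea of a dilation that follows the auxiliary $F_{0}$ feature can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract gives no formula, so the scaling rule used here (sampling rate divided by the product of $F_{0}$ and a "dense factor"), the rounding, the fallback for unvoiced samples, and the names `pitch_dependent_dilation`, `pdcnn_1d`, and `dense_factor` are all assumptions made for illustration. The convolution is reduced to a single channel with three scalar weights.

```python
import numpy as np

def pitch_dependent_dilation(f0, fs, dense_factor=4, base_dilation=1):
    """Per-sample dilation size (assumed rule: fs / (f0 * dense_factor) * base_dilation).

    Unvoiced samples (f0 <= 0) fall back to base_dilation.
    """
    f0 = np.asarray(f0, dtype=float)
    dil = np.full(f0.shape, base_dilation, dtype=int)
    voiced = f0 > 0
    scale = fs / (f0[voiced] * dense_factor)
    dil[voiced] = np.maximum(1, np.rint(scale * base_dilation).astype(int))
    return dil

def pdcnn_1d(x, dil, w_prev, w_cur, w_next):
    """Non-causal, kernel-size-3 convolution whose dilation varies per sample.

    Single-channel illustration with scalar weights; out-of-range taps read zero.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))

    def gather(idx):
        valid = (idx >= 0) & (idx < len(x))
        out = np.zeros_like(x)
        out[valid] = x[idx[valid]]
        return out

    return w_prev * gather(t - dil) + w_cur * x + w_next * gather(t + dil)

# Doubling F0 halves the dilation, so the receptive field tracks the pitch period.
dil = pitch_dependent_dilation([200.0, 400.0, 0.0], fs=16000)
```

Under these assumptions, scaling the $F_{0}$ input changes which past and future samples each output sees, which is one way a network's effective architecture can depend on pitch.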

Similar resources

A Nonlinear Autoregressive Model with Exogenous Variables Neural Network for Stock Market Timing: The Candlestick Technical Analysis

In this paper, the nonlinear autoregressive model with exogenous variables is used as a new neural network for timing the stock markets on the basis of Japanese candlestick technical analysis. In this model, the “nonlinear autoregressive model with exogenous variables” is an analyzer. For a more reliable comparison, here (like the literature) two approaches of Raw-based and Signal-ba...

Non-melanoma skin cancer diagnosis with a convolutional neural network

Background: The most common types of non-melanoma skin cancer are basal cell carcinoma (BCC) and squamous cell carcinoma (SCC). AKIEC, that is, actinic keratoses (solar keratoses) and intraepithelial carcinoma (Bowen’s disease), are common non-invasive precursors of SCC, which may progress to invasive SCC if left untreated. Due to the importance of early detection in cancer treatment, this study aimed...

WaveNet: A Generative Model for Raw Audio

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-...

Non radial model of dynamic DEA with the parallel network structure

In this article, a non-radial method of dynamic DEA with a parallel network structure is presented and used to calculate relative efficiency measures when inputs and outputs do not change equally. In this model, the DMU divisions under evaluation are arranged in parallel, but the dynamic structure is assumed to be in series. Since in real applications there are undesirable inputs an...

A New Type-II Fuzzy Logic Based Controller for Non-Linear Dynamical Systems with Application to 3-PSP Parallel Robot

Abstract: Type-II fuzzy logic has shown its superiority over traditional fuzzy logic when dealing with uncertainty. Type-II fuzzy logic controllers are, however, newer and more promising approaches that have recently been applied to various fields due to their significant contribution, especially when noise (as an important instance of uncertainty) emerges. During the design of type- i fuz...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2021

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2021.3051765